Large Vocabulary Search Space Reduction Employing Directed Acyclic Word Graphs and Phonological Rules

Authors

  • Kallirroi Georgila
  • Nikos Fakotakis
  • George K. Kokkinakis
Abstract

Some applications of speech recognition, such as automatic directory information services, require very large vocabularies. In this paper, we focus on the task of recognizing surnames in an Interactive telephone-based Directory Assistance Services (IDAS) system, which exceeds other large vocabulary applications in complexity and vocabulary size. We present a method for building compact networks that reduce the search space in very large vocabularies using Directed Acyclic Word Graphs (DAWGs). Furthermore, trees, graphs and full-forms (whole words with no merging of nodes) are compared directly under the same conditions, using the same decoder and the same vocabularies. Experimental results showed that, as we move from full-form lexicons to trees and then to graphs, the size of the recognition network is reduced, as is the recognition time. Recognition accuracy is nevertheless retained, since the same phoneme combinations are involved. Subsequently, we refine the N-best hypothesis list provided by the speech recognizer by applying context-dependent phonological rules. Thus, a small number N in the N-best hypothesis list produces multiple solutions sufficient to retain high accuracy while achieving real-time response. Recognition tests with a vocabulary of 88,000 surnames that correspond to 123,313 distinct pronunciations proved the efficiency of the approach. For N = 3 (a value that ensures fast performance), the recognition accuracy before the application of rules was 70.27%. After applying phonological rules, the recognition accuracy rose to 86.75%.
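The size reduction from full-forms to trees to graphs described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the four phoneme sequences in `LEXICON` are invented for illustration, the function names are hypothetical, and network size is measured here simply as the number of phoneme arcs. The graph is obtained by merging identical subtrees of the prefix tree, which is one standard way to build a minimal acyclic automaton.

```python
from typing import Dict, List, Tuple

# Toy pronunciation lexicon: hypothetical phoneme strings, not data from the paper.
LEXICON: List[Tuple[str, ...]] = [
    ("t", "a", "p", "a", "s"),
    ("t", "o", "p", "a", "s"),
    ("k", "a", "p", "a", "s"),
    ("k", "o", "p", "a", "s"),
]


def full_form_size(words: List[Tuple[str, ...]]) -> int:
    """Full-form lexicon: every word keeps its own phoneme arcs, nothing is shared."""
    return sum(len(w) for w in words)


def build_trie(words: List[Tuple[str, ...]]) -> Dict:
    """Prefix tree as nested dicts; the key None marks the end of a word."""
    root: Dict = {}
    for word in words:
        node = root
        for phoneme in word:
            node = node.setdefault(phoneme, {})
        node[None] = {}
    return root


def trie_size(node: Dict) -> int:
    """Number of phoneme arcs in the tree (common prefixes are shared)."""
    return sum(1 + trie_size(child)
               for phoneme, child in node.items() if phoneme is not None)


def dawg_size(root: Dict) -> int:
    """Number of phoneme arcs after merging identical subtrees.

    Two tree nodes become one graph state when they agree on finality
    and on their (phoneme, merged child state) pairs, so common
    suffixes are shared as well as common prefixes.
    """
    registry: Dict[tuple, int] = {}   # state signature -> state id
    out_arcs: Dict[int, int] = {}     # state id -> number of outgoing arcs

    def canonical(node: Dict) -> int:
        is_final = None in node
        children = tuple(sorted(
            (phoneme, canonical(child))
            for phoneme, child in node.items() if phoneme is not None
        ))
        signature = (is_final, children)
        if signature not in registry:
            registry[signature] = len(registry)
            out_arcs[registry[signature]] = len(children)
        return registry[signature]

    canonical(root)
    return sum(out_arcs.values())


if __name__ == "__main__":
    trie = build_trie(LEXICON)
    print("full-form arcs:", full_form_size(LEXICON))
    print("tree arcs:     ", trie_size(trie))
    print("graph arcs:    ", dawg_size(trie))
```

In this toy case the tree saves little because the words diverge at the first or second phoneme, while the graph also shares the common ending. All three structures spell out exactly the same phoneme sequences, which is the reason the abstract gives for accuracy being unaffected.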


Related resources

A Speech-Based Human-Computer Interaction System for Automating Directory Assistance Services

The automation of Directory Assistance Services (DAS) through speech is one of the most difficult and demanding applications of human-computer interaction because it deals with very large vocabulary recognition issues. In this paper, we present a spoken dialogue system for automating DAS. Taking into account the major difficulties of this endeavor, a stepwise approach was adopted. In particular...


Integrating contextual phonological rules in a large vocabulary decoder

This paper presents an approach to the integration of contextual phonological rules in the beam-search algorithm of a large vocabulary speech recognition system. The main interest of contextual transcription rules is that they implement constraints on pronunciation sequences which complement the bigram constraints on word sequences. As such, they should help avoid acoustic confusions and ...


A general algorithm for word graph matrix decomposition

In automatic speech recognition, word graphs (lattices) are commonly used as an approximate representation of the complete word search space. Usually these word lattices are acyclic and have no a priori structure. More recently, a new class of normalized word lattices has been proposed. These word lattices (a.k.a. sausages) are very efficient (space) and they provide a normalization (chunking) ...


Word clustering effect on vocabulary learning of EFL learners: A case of semantic versus phonological clustering

The aim of this study is to determine the effect of the word clustering method on the vocabulary learning of Iranian EFL learners through a case of semantic versus phonological clustering. To this end, 80 homogeneous students from four intermediate classes at an English institute in Torbat e Heydariyeh participated in this research. They were assigned to four groups according to semantic versus phon...


Large Vocabulary Speech Recognition in English and French

In this paper we report efforts at LIMSI in speaker independent large vocabulary speech recognition in French and in English. The recognizer makes use of continuous density HMM (CDHMM) with Gaussian mixture for acoustic modeling and n-gram statistics estimated on text material for language modeling. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and inter-...



Journal title:
  • I. J. Speech Technology

Volume 5


Publication date: 2002